AITopics | visual question

A Limitations and Societal

Neural Information Processing Systems, Apr-30-2026, 02:28:50 GMT

Limitations

One limitation of our model is its potential for data bias. KOSMOS-1 is trained on a web-scale multimodal corpus, so it is likely to reflect biases present in that training data. This could lead the model to generate text that is biased toward certain demographics or viewpoints. Another limitation of KOSMOS-1 is its relatively small size compared with other large language models. As a result, the model may be less able to learn complex relationships between modalities, which could cause it to make mistakes on tasks that require a deep understanding of multiple modalities. Finally, KOSMOS-1 supports only the vision modality.

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Minnesota (0.28)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

0c007ebef1d11fd48da6ce4f54687db6-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Apr-24-2026, 17:08:48 GMT

large language model, machine learning, question answering, (21 more...)

Neural Information Processing Systems

Country: Asia (0.46)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Health Care Technology (0.69)
Health & Medicine > Nuclear Medicine (0.68)

Technology:

Information Technology > Databases (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Information Management (0.93)
(4 more...)

029df12a9363313c3e41047844ecad94-Paper-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 05:58:35 GMT

information retrieval, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County (0.28)

Genre: Workflow (0.47)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(6 more...)

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

Neural Information Processing Systems, Mar-22-2026, 12:10:19 GMT

Vision Language Models (VLMs) demonstrate remarkable proficiency in addressing a wide array of visual questions, which require strong perception and reasoning faculties. Assessing these two competencies independently is crucial for model refinement, but it is inherently difficult because seeing and reasoning are intertwined in existing VLMs. To tackle this issue, we present Prism, an innovative framework designed to disentangle the perception and reasoning processes involved in visual question solving. Prism comprises two distinct stages: a perception stage that uses a VLM to extract and articulate visual information in textual form, and a reasoning stage that formulates responses from the extracted visual information using a Large Language Model (LLM). This modular design enables the systematic comparison and assessment of both proprietary and open-source VLMs for their perception and reasoning strengths. Our analytical framework provides several valuable insights, underscoring Prism's potential as a cost-effective solution for vision-language tasks. By combining a streamlined VLM focused on perception with a powerful LLM tailored for reasoning, Prism achieves superior results in general vision-language tasks while substantially cutting training and operational costs. Quantitative evaluations show that Prism, when configured with a vanilla 2B LLaVA and freely accessible GPT-3.5, delivers performance on par with VLMs $10 \times$ larger on the rigorous multimodal benchmark MMStar.
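The two-stage decoupling described in the Prism abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the model interfaces (`VLM`, `LLM`), the class `PrismPipeline`, the helper `compare_perceivers`, the prompt wording, and the exact-match scoring are all hypothetical. The key property it shows is that the reasoning stage sees only the text produced by the perception stage, so holding the reasoner fixed and swapping perceivers isolates differences in perception.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interfaces: any callables with these shapes can be plugged in.
VLM = Callable[[bytes, str], str]  # (image, instruction) -> textual description
LLM = Callable[[str], str]         # text prompt -> answer


@dataclass
class PrismPipeline:
    """Perception stage (VLM -> text), then a text-only reasoning stage (LLM)."""
    perceiver: VLM
    reasoner: LLM
    # Hypothetical instruction; the paper's actual prompts may differ.
    perception_instruction: str = "Describe the contents of the image in detail."

    def perceive(self, image: bytes, question: str) -> str:
        # Stage 1: the VLM turns the image into a textual description.
        return self.perceiver(image, f"{self.perception_instruction}\nQuestion: {question}")

    def reason(self, question: str, description: str) -> str:
        # Stage 2: the LLM never sees the image, only the extracted description.
        prompt = f"Image description:\n{description}\n\nQuestion: {question}\nAnswer:"
        return self.reasoner(prompt)

    def answer(self, image: bytes, question: str) -> tuple[str, str]:
        description = self.perceive(image, question)
        return description, self.reason(question, description)


def compare_perceivers(perceivers: dict[str, VLM], reasoner: LLM,
                       dataset: list[tuple[bytes, str, str]]) -> dict[str, float]:
    """Keep the reasoner fixed and swap perceivers so that accuracy differences
    reflect perception quality. Exact-match scoring is a simplifying assumption."""
    scores: dict[str, float] = {}
    for name, vlm in perceivers.items():
        pipeline = PrismPipeline(perceiver=vlm, reasoner=reasoner)
        correct = sum(
            pipeline.answer(image, question)[1].strip().lower() == gold.strip().lower()
            for image, question, gold in dataset
        )
        scores[name] = correct / len(dataset) if dataset else 0.0
    return scores
```

With stub models (for example, a "good" perceiver that mentions every object and a "bad" one that omits them), `compare_perceivers` would attribute the accuracy gap entirely to perception, because both runs share the same reasoner.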

artificial intelligence, large language model, natural language, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Understanding Information Storage and Transfer in Multi-Modal Large Language Models

Neural Information Processing Systems, Mar-18-2026, 02:13:12 GMT

Understanding the mechanisms of information storage and transfer in Transformer-based models is important for advancing our understanding of these models. Recent work has studied these mechanisms for Large Language Models (LLMs), revealing insights into how information is stored in a model's parameters and how information flows to and from these parameters in response to specific prompts. However, these studies have not yet been extended to Multi-modal Large Language Models (MLLMs). Given their expanding capabilities and real-world use, we start by studying one aspect of these models: how MLLMs process information in a factual visual question answering task. We use a constraint-based formulation that views a visual question as having a set of visual or textual constraints that the model's generated answer must satisfy to be correct (e.g.
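The constraint-based view of a visual question mentioned in this abstract can be illustrated with a small data structure. This is a hypothetical sketch, not the paper's formulation or code: the classes `Constraint` and `ConstrainedQuestion`, the landmark question, and the toy lookup tables are invented for illustration. Each constraint is tagged as visual or textual and is modeled as a predicate on the generated answer; under this view, an answer counts as correct only when every constraint holds.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Constraint:
    """One condition the answer must meet; `source` says where it comes from."""
    source: str                   # "visual" or "textual" (hypothetical labels)
    description: str
    check: Callable[[str], bool]  # predicate applied to the generated answer


@dataclass
class ConstrainedQuestion:
    text: str
    constraints: list[Constraint] = field(default_factory=list)

    def unsatisfied(self, answer: str) -> list[Constraint]:
        # Constraints the answer violates, in their original order.
        return [c for c in self.constraints if not c.check(answer)]

    def is_correct(self, answer: str) -> bool:
        # An answer is correct only if it satisfies every constraint.
        return not self.unsatisfied(answer)


# Toy, hand-written facts used only for illustration.
CITY_OF_LANDMARK = {"Eiffel Tower": "Paris"}
KNOWN_CITIES = {"Paris", "London", "Rome"}
pictured = "Eiffel Tower"  # assumed visual content of the image

q = ConstrainedQuestion(
    text="In which city is the landmark shown in the image located?",
    constraints=[
        Constraint("visual", "is where the pictured landmark is located",
                   lambda a: a == CITY_OF_LANDMARK[pictured]),
        Constraint("textual", "is a city", lambda a: a in KNOWN_CITIES),
    ],
)
```

Here "London" satisfies the textual constraint but not the visual one, so `q.unsatisfied("London")` isolates which part of the question the model got wrong.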

information, large language model, natural language, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)